Reviews: Uplift Modeling from Separate Labels
This paper proposes an approach to heterogeneous treatment effect estimation (what it calls "uplift modeling") from separate populations. A simple version of the paper's setup is as follows. We have two populations, k ∈ {1, 2}, with different probabilities of treatment conditional on observed features, P_k[T | X] (the paper also allows for the case where these need to be estimated). We have access to covariate-outcome pairs (X, Y) drawn from both populations, so we can estimate E_k[Y | X]. We assume potential outcomes Y(-1), Y(1), and assume that E[Y(T) | X] does not depend on the population k. What we would really want is to estimate a conditional average treatment effect τ(x) = E[Y(1) - Y(-1) | X = x].
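Under these assumptions there is a simple identification argument: writing m_k(x) = E_k[Y | X = x] and p_k(x) = P_k[T = 1 | X = x], each observable mean decomposes as m_k(x) = p_k(x) E[Y(1) | x] + (1 - p_k(x)) E[Y(-1) | x], so τ(x) = (m_1(x) - m_2(x)) / (p_1(x) - p_2(x)) wherever the propensities differ. A minimal NumPy sketch of this identity (all functional forms below are assumed for illustration, not taken from the paper):

```python
import numpy as np

def cate_from_two_populations(m1, m2, p1, p2):
    """Recover tau(x) = (m1(x) - m2(x)) / (p1(x) - p2(x)) from the two
    populations' observable mean outcomes and treatment propensities."""
    return (m1 - m2) / (p1 - p2)

x = np.linspace(-1.0, 1.0, 5)
mu1 = 1.0 + x                # assumed E[Y(1) | X = x]
mu0 = 0.5 * x                # assumed E[Y(-1) | X = x]
p1 = np.full_like(x, 0.8)    # treatment probability in population 1
p2 = np.full_like(x, 0.3)    # treatment probability in population 2

# Observable conditional means m_k(x), one per population.
m1 = p1 * mu1 + (1 - p1) * mu0
m2 = p2 * mu1 + (1 - p2) * mu0

tau_hat = cate_from_two_populations(m1, m2, p1, p2)
# tau_hat matches the true effect mu1 - mu0 exactly in this noiseless setting.
```

In practice m_k and p_k would be estimated from the covariate-outcome pairs, and the identity only holds where p_1(x) ≠ p_2(x).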
Label-anticipated Event Disentanglement for Audio-Visual Video Parsing
Zhou, Jinxing, Guo, Dan, Mao, Yuxin, Zhong, Yiran, Chang, Xiaojun, Wang, Meng
The Audio-Visual Video Parsing (AVVP) task aims to detect and temporally locate events within the audio and visual modalities. Multiple events can overlap in the timeline, making identification challenging. While traditional methods usually focus on improving the early audio-visual encoders to embed more effective features, the decoding phase, which is crucial for final event classification, often receives less attention. We aim to advance the decoding phase and improve its interpretability. Specifically, we introduce a new decoding paradigm, label semantic-based projection (LEAP), that employs the label texts of event categories, each bearing distinct and explicit semantics, to parse potentially overlapping events. LEAP works by iteratively projecting encoded latent features of audio/visual segments onto semantically independent label embeddings. This process, enriched by modeling cross-modal (audio/visual-label) interactions, gradually disentangles event semantics within video segments to refine the relevant label embeddings, ensuring a more discriminative and interpretable decoding process. To facilitate the LEAP paradigm, we propose a semantic-aware optimization strategy that includes a novel audio-visual semantic similarity loss function. This function leverages the Intersection over Union of audio and visual events (EIoU) as a metric to calibrate audio-visual similarities at the feature level, accommodating the varied event densities across modalities. Extensive experiments demonstrate the superiority of our method, achieving new state-of-the-art performance for AVVP and also enhancing the related audio-visual event localization task.
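The core projection idea can be sketched in a few lines: per-segment features are compared against one embedding per event label, yielding interpretable per-segment, per-class scores. This is a hypothetical simplification (the sizes, the cosine-similarity scoring, and the sigmoid readout are assumptions; the paper's iterative refinement and cross-modal interactions are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, C = 10, 64, 25  # segments, feature dim, event categories (assumed sizes)

segment_feats = rng.normal(size=(T, D))  # encoded audio or visual segments
label_embeds = rng.normal(size=(C, D))   # one embedding per event label text

def l2norm(a):
    """Normalize rows to unit length so dot products are cosine similarities."""
    return a / np.linalg.norm(a, axis=-1, keepdims=True)

# Project each segment onto every label embedding: (T, C) similarity matrix.
scores = l2norm(segment_feats) @ l2norm(label_embeds).T

# Independent per-class probabilities, since multiple events may overlap
# within a single segment.
probs = 1.0 / (1.0 + np.exp(-scores))
```

Reading off event predictions per segment from `probs` is what makes the decoding interpretable: each column corresponds to a named event category rather than an anonymous latent dimension.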
On the Ramifications of Human Label Uncertainty
Zhou, Chen, Prabhushankar, Mohit, AlRegib, Ghassan
In this work, we study the ramifications of human label uncertainty (HLU). Our evaluation of existing uncertainty estimation algorithms in the presence of HLU reveals the limitations of existing uncertainty metrics, and of the algorithms themselves, in response to HLU. Meanwhile, we observe undue effects on predictive uncertainty and generalizability. To mitigate these undue effects, we introduce a novel natural scene statistics (NSS) based label dilution training scheme that does not require massive human labels. Specifically, we first select a subset of samples with low perceptual quality, ranked by the statistical regularities of images. We then assign separate labels to each sample in this subset to obtain a training set with diluted labels. Our experiments and analysis demonstrate that training with NSS-based label dilution alleviates the undue effects caused by HLU.
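The dilution step described above can be sketched as follows. This is a hypothetical illustration: the quality scores are random stand-ins for the NSS-based ranking, and the reassignment rule (uniform relabeling of the low-quality subset) and the dilution fraction are assumptions, not the paper's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
n, num_classes = 100, 10
labels = rng.integers(0, num_classes, size=n)
quality = rng.random(n)  # stand-in for NSS-based perceptual quality scores

frac = 0.2                       # assumed dilution fraction
k = int(frac * n)
low_q = np.argsort(quality)[:k]  # indices of the k lowest-quality samples

# Dilute: keep original labels for high-quality samples, reassign labels
# only within the low-quality subset.
diluted = labels.copy()
diluted[low_q] = rng.integers(0, num_classes, size=k)
```

The point of the scheme is that the subset selection is driven by image statistics alone, so no additional human annotation is needed to decide which labels to dilute.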